
@kaovilai

@kaovilai kaovilai commented Nov 13, 2025

Summary

This PR refactors the Dockerfile to enable multiarch support and build the csi-snapshot-metadata binary from source in the builder stage.

Changes

1. Multiarch grpc_health_probe support

  • Add TARGETARCH build argument
  • Use grpc_health_probe-linux-${TARGETARCH} instead of hardcoded amd64
  • Add --platform=$BUILDPLATFORM to builder stage for cross-compilation
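A minimal Go sketch of the substitution, assuming the release-asset naming `grpc_health_probe-linux-<arch>` from the original `ADD` URL holds for every target architecture, and that buildx passes Go-style architecture names (`amd64`, `arm64`, ...) in `TARGETARCH`. The `healthProbeURL` helper is hypothetical; it only mirrors what the Dockerfile `ADD` line does with `${TARGETARCH}`.

```go
package main

import (
	"fmt"
	"runtime"
)

// Release download prefix taken from the original Dockerfile ADD line.
const probeReleaseBase = "https://github.com/grpc-ecosystem/grpc-health-probe/releases/download"

// healthProbeURL is a hypothetical helper mirroring
// grpc_health_probe-linux-${TARGETARCH}: the OS stays "linux" and only the
// architecture varies, instead of the previously hardcoded "amd64".
func healthProbeURL(version, targetArch string) string {
	return fmt.Sprintf("%s/%s/grpc_health_probe-linux-%s", probeReleaseBase, version, targetArch)
}

func main() {
	// runtime.GOARCH stands in for TARGETARCH here; both use Go-style names.
	// "vX.Y.Z" is a placeholder, not a real release tag.
	fmt.Println(healthProbeURL("vX.Y.Z", runtime.GOARCH))
}
```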

2. Build csi-snapshot-metadata in builder stage

Dockerfile changes:

  • Use golang:1.24-alpine as builder base (was alpine)
  • Add TARGETOS and LDFLAGS build arguments
  • Copy Go source files (go.mod, go.sum, cmd/, pkg/, client/, vendor/)
  • Build binary in builder with cross-compilation support
  • Copy built binary from builder (was copying from host filesystem)
  • Remove obsolete binary ARG
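With `--platform=$BUILDPLATFORM`, the builder stage runs natively on the build host and the Go toolchain cross-compiles by setting `GOOS`/`GOARCH` from `TARGETOS`/`TARGETARCH`. A minimal Go sketch of the equivalent `go build` invocation; the package path `./cmd/csi-snapshot-metadata`, the output path, `CGO_ENABLED=0`, and the `-s -w` LDFLAGS value are assumptions for illustration, not taken from the PR's Dockerfile.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// crossBuildCmd builds (but does not run) a `go build` command that
// cross-compiles pkg for targetOS/targetArch, passing ldflags through
// the way the LDFLAGS build argument would.
func crossBuildCmd(targetOS, targetArch, ldflags, out, pkg string) *exec.Cmd {
	cmd := exec.Command("go", "build", "-ldflags", ldflags, "-o", out, pkg)
	// os/exec uses the last value for duplicate keys, so these override
	// anything inherited from os.Environ().
	cmd.Env = append(os.Environ(),
		"CGO_ENABLED=0", // assumption: static binary for a distroless base
		"GOOS="+targetOS,
		"GOARCH="+targetArch,
	)
	return cmd
}

// envValue returns the last value set for key, matching os/exec's handling
// of duplicate environment entries.
func envValue(env []string, key string) string {
	val := ""
	for _, kv := range env {
		if k, v, ok := strings.Cut(kv, "="); ok && k == key {
			val = v
		}
	}
	return val
}

func main() {
	cmd := crossBuildCmd("linux", "arm64", "-s -w", "/out/csi-snapshot-metadata", "./cmd/csi-snapshot-metadata")
	fmt.Println(cmd.String())
	fmt.Println("GOARCH =", envValue(cmd.Env, "GOARCH"))
}
```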

Makefile changes (release-tools/build.make):

  • Replace --build-arg binary with --build-arg LDFLAGS
  • Remove build-% dependency from push-multiarch-% target
    (Docker now builds from source, eliminating duplicate compilation)
  • Fix manifest error detection to handle both error formats:
    • "manifest for ... not found" (docker.io)
    • "manifest unknown" (ghcr.io, quay.io)
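The Makefile's actual check is shell, not Go. As a hedged illustration of the matching rule only, here is a Go sketch that treats either message as "manifest not found" and anything else as a real error; the function name `isManifestNotFound` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// isManifestNotFound reports whether registry/CLI output means the manifest
// does not exist, accepting both formats listed above:
//   - "manifest for <ref> not found" (docker.io)
//   - "manifest unknown"             (ghcr.io, quay.io)
func isManifestNotFound(output string) bool {
	lower := strings.ToLower(output)
	if strings.Contains(lower, "manifest unknown") {
		return true
	}
	// docker.io style: "not found" must follow "manifest for ".
	i := strings.Index(lower, "manifest for ")
	return i >= 0 && strings.Contains(lower[i:], " not found")
}

func main() {
	for _, out := range []string{
		"manifest for docker.io/example/image:tag not found",
		"manifest unknown",
		"unauthorized: authentication required",
	} {
		fmt.Printf("%q -> %v\n", out, isManifestNotFound(out))
	}
}
```

Any other failure (for example an authentication error) is not classified as "not found", so it is not mistaken for a missing tag.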

Benefits

  • ✅ Self-contained builds (no external Makefile dependency for Docker)
  • ✅ True cross-compilation via Docker buildx
  • ✅ Consistent approach (both binaries prepared in builder stage)
  • ✅ Easier for contributors (just docker build or podman build)
  • ✅ Supports linux/amd64 and linux/arm64 platforms

Testing

Tested with multiarch build to ghcr.io:

make push-multiarch-csi-snapshot-metadata \
  REGISTRY_NAME=ghcr.io/kaovilai \
  PULL_BASE_REF=test-20251119-014146 \
  BUILD_PLATFORMS="linux amd64 amd64; linux arm64 arm64"

Verified multiarch manifest contains both architectures:

  • linux/amd64: sha256:38a12212d86d92c20fb5762c63846bf8290790ed048b2622d9550cac9334416e
  • linux/arm64: sha256:e01b090c4db6124e802e73ee034632bb98cdc3b7c6dc418cb5c03e8cf785418a
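One way to script the "both architectures present" check is to parse the manifest list that `docker manifest inspect` prints (the same JSON shape as the `registry.k8s.io` output quoted later in this thread). A minimal Go sketch, assuming the manifest list is piped in on stdin, e.g. `docker manifest inspect <image> | go run .`; `platforms` and `hasAll` are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// manifestList models only the fields of a Docker manifest list
// (application/vnd.docker.distribution.manifest.list.v2+json) used here.
type manifestList struct {
	Manifests []struct {
		Digest   string `json:"digest"`
		Platform struct {
			OS           string `json:"os"`
			Architecture string `json:"architecture"`
			Variant      string `json:"variant"`
		} `json:"platform"`
	} `json:"manifests"`
}

// platforms returns "os/arch" or "os/arch/variant" for each entry, in order.
func platforms(raw []byte) ([]string, error) {
	var ml manifestList
	if err := json.Unmarshal(raw, &ml); err != nil {
		return nil, err
	}
	out := make([]string, 0, len(ml.Manifests))
	for _, m := range ml.Manifests {
		p := m.Platform.OS + "/" + m.Platform.Architecture
		if m.Platform.Variant != "" {
			p += "/" + m.Platform.Variant
		}
		out = append(out, p)
	}
	return out, nil
}

// hasAll reports whether every wanted platform appears in got.
func hasAll(got []string, want ...string) bool {
	seen := make(map[string]bool, len(got))
	for _, p := range got {
		seen[p] = true
	}
	for _, w := range want {
		if !seen[w] {
			return false
		}
	}
	return true
}

func main() {
	raw, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read stdin:", err)
		return
	}
	ps, err := platforms(raw)
	if err != nil {
		fmt.Fprintln(os.Stderr, "parse manifest list:", err)
		return
	}
	fmt.Println(ps)
	fmt.Println("linux/amd64 and linux/arm64 present:", hasAll(ps, "linux/amd64", "linux/arm64"))
}
```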

🤖 Generated with Claude Code

Co-Authored-By: Claude [email protected]


What type of PR is this?
/kind feature

What this PR does / why we need it:
Enables building multiarch container images (amd64, arm64) with the csi-snapshot-metadata binary built from source in the Dockerfile builder stage, making the build process self-contained and easier for contributors.

Which issue(s) this PR fixes:
N/A

Special notes for your reviewer:

  • The Dockerfile now builds from source instead of relying on pre-built binaries from the Makefile
  • The Makefile changes are backward compatible: make build-csi-snapshot-metadata still works for local development
  • Tested successfully with both docker and podman

Does this PR introduce a user-facing change?:

Container images now support multiple architectures (amd64, arm64) via multiarch manifests

@k8s-ci-robot k8s-ci-robot added the do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. label Nov 13, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kaovilai
Once this PR has been reviewed and has the lgtm label, please assign carlbraganza for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 13, 2025
@k8s-ci-robot
Contributor

Hi @kaovilai. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Nov 13, 2025
@kaovilai
Author

CSI Snapshot Metadata Container Readiness Probe Fails on ARM64

Problem:

  • The csi-snapshot-metadata container's readiness probe fails with "Exec format error"
  • The grpc_health_probe binary in the container image is built for AMD64, not ARM64
  • This prevents the container from becoming "Ready" (8/9 containers ready)

Impact: Low. The container is functional despite the readiness probe failure; CBT APIs work correctly.

Evidence:

$ kubectl describe pod csi-hostpathplugin-0 -n default
Warning  Unhealthy  Readiness probe failed: exec: Exec format error
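"Exec format error" is the kernel refusing to run a binary built for a different CPU. A minimal Go sketch for confirming which architecture a binary targets by reading its ELF header with the standard `debug/elf` package; the default path `/bin/grpc_health_probe` comes from the Dockerfile `ADD` line, and `binaryArch`/`archOfMachine` are hypothetical helpers that cover only a few common machine types.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
	"runtime"
)

// archOfMachine maps an ELF e_machine value to a Go-style GOARCH name.
// Only a few machine types are covered; everything else is "unknown".
func archOfMachine(machine elf.Machine, littleEndian bool) string {
	switch machine {
	case elf.EM_X86_64:
		return "amd64"
	case elf.EM_AARCH64:
		return "arm64"
	case elf.EM_ARM:
		return "arm"
	case elf.EM_PPC64:
		// The same machine value is used for both byte orders.
		if littleEndian {
			return "ppc64le"
		}
		return "ppc64"
	default:
		return "unknown"
	}
}

// binaryArch reads the ELF header of the file at path.
func binaryArch(path string) (string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	return archOfMachine(f.Machine, f.Data == elf.ELFDATA2LSB), nil
}

func main() {
	path := "/bin/grpc_health_probe"
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	arch, err := binaryArch(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read ELF header:", err)
		return
	}
	if arch != runtime.GOARCH {
		fmt.Printf("%s targets %s but this host is %s; exec may fail with \"exec format error\"\n", path, arch, runtime.GOARCH)
		return
	}
	fmt.Printf("%s targets %s, matching this host\n", path, arch)
}
```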

Root Cause:
Upstream CSI driver images (gcr.io/k8s-staging-sig-storage/csi-snapshot-metadata:canary) are primarily built for AMD64

Refactors the Dockerfile to build the csi-snapshot-metadata binary
from source in the builder stage, similar to how grpc_health_probe
is obtained (via ADD from remote URL).

Changes to cmd/csi-snapshot-metadata/Dockerfile:
- Use golang:1.24-alpine as builder base (was alpine)
- Add TARGETOS and LDFLAGS build arguments
- Copy Go source files (go.mod, go.sum, cmd/, pkg/, client/, vendor/)
- Build binary in builder with cross-compilation support
- Copy built binary from builder (was copying from host filesystem)
- Remove obsolete binary ARG

Changes to release-tools/build.make:
- Replace --build-arg binary with --build-arg LDFLAGS
- Remove build-% dependency from push-multiarch-% target
  (Docker now builds from source, eliminating duplicate compilation)
- Fix manifest error detection to handle both error formats:
  - "manifest for ... not found" (docker.io)
  - "manifest unknown" (ghcr.io, quay.io)

Benefits:
- Self-contained builds (no external Makefile dependency for Docker)
- True cross-compilation via Docker buildx
- Consistent approach (both binaries prepared in builder stage)
- Easier for contributors (just docker/podman build)

Tested with multiarch build to ghcr.io (amd64, arm64).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@kaovilai kaovilai force-pushed the multiarch-grpc-health-probe branch from 0fb6f90 to 0e03558, November 19, 2025 06:44
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Nov 19, 2025
@Rakshith-R
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 26, 2025
@k8s-ci-robot
Contributor

@kaovilai: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-kubernetes-csi-external-snapshot-metadata-unit | 0e03558 | link | true | /test pull-kubernetes-csi-external-snapshot-metadata-unit |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@jsafrane
Contributor

The released snapshot-metadata sidecar images are already multiarch, built by Makefile in this repository. What exactly are you trying to achieve?

$ docker manifest inspect registry.k8s.io/sig-storage/csi-snapshot-metadata:v0.2.0
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 3235,
         "digest": "sha256:08eeefe99d22c9893c5bf694d6c719a7bc08601fd55b09bbe86fde72de8a41b7",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 3235,
         "digest": "sha256:96df09d94f4513b513c83d52bd65f3fade8c269fd7529fe47548b4fcd47d6ce4",
         "platform": {
            "architecture": "arm",
            "os": "linux",
            "variant": "v7"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 3235,
         "digest": "sha256:cdee9fdebb7e69446014537e63ec1485ad111f3d344d6a5a54b356fc4711eb3c",
         "platform": {
            "architecture": "arm",
            "os": "linux",
            "variant": "v7"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 3235,
         "digest": "sha256:ebbc2cb3aff245d7450f986389bcebd48726035d36f14736a88c26c68cb89d8d",
         "platform": {
            "architecture": "arm64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 3235,
         "digest": "sha256:a935233cad5e844cdaa8b54e9a2bb80f0c3c6b9cfc5195612f5d790228c35774",
         "platform": {
            "architecture": "ppc64le",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 3235,
         "digest": "sha256:f06da751ce627bc1a0f300b0ee4e524cc5e86ca4036921a4b68396785e2989d9",
         "platform": {
            "architecture": "s390x",
            "os": "linux"
         }
      }
   ]
}

@jsafrane
Contributor

Upstream CSI driver images (gcr.io/k8s-staging-sig-storage/csi-snapshot-metadata:canary) are primarily built for AMD64

No, csi-snapshot-metadata:canary is available as s390x, ppc64le, arm64, arm and amd64

@kaovilai
Author

@jsafrane For one, I want to remove the hardcoded amd64 from the Dockerfile:

ADD https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64 /bin/grpc_health_probe

The main change, per the PR description, is:

  1. Multiarch grpc_health_probe support
    Add TARGETARCH build argument
    Use grpc_health_probe-linux-${TARGETARCH} instead of hardcoded amd64
    Add --platform=$BUILDPLATFORM to builder stage for cross-compilation

The rest is to speed up the build and make it possible to build anywhere with just docker/podman build, by making all binary builds happen inside the container instead of outside.

@kaovilai
Author

Without my change, the health probe WILL FAIL with an exec format error on non-amd64 clusters. I know because I ran it from the "already multiarch" image.

@kaovilai
Author

$ kubectl describe pod csi-hostpathplugin-0 -n default
Warning  Unhealthy  Readiness probe failed: exec: Exec format error

@kaovilai
Author

kaovilai commented Nov 28, 2025

Dependency management is much cleaner and more easily pluggable into CI systems or other container build automation (such as quay.io auto-build on push, where the build environment knows nothing about Makefile cross-built binaries) if all the requirements are handled in a container, instead of cross-building in the Makefile and then copying the built binaries into the container.

@jsafrane
Copy link
Contributor

Oh, I see, this image includes a binary from alpine! That's bad, please remove it. Such a binary does not go through any CVE scanning. We in Kubernetes make a huge effort not to include any 3rd party binaries in our images that we don't control.

Try to use a different health probe, e.g. a simple web server with a /health endpoint, together with /metrics.

BTW, it would be better not to include a huge wall of AI-generated text, and instead write something significantly shorter where no information gets lost.

@kaovilai
Author

Wtal. Thanks

@kaovilai
Author

this image includes a binary from alpine!

Do you mean that neither alpine nor golang:1.24-alpine should be used as a builder image?

Would golang:1.24 or golang:1.24-bookworm work better?
I will check other k8s repos.

@jsafrane
Copy link
Contributor

Do not use any builder image. make container builds the binary locally (or on our build farm) and Dockerfile copies the sidecar into the final image. And only the sidecar, no grpc_health_probe.

@kaovilai
Copy link
Author

Even if it's a multi-stage build? The build image isn't included in the final container image; only the built binary is, on top of the current distroless base.

@kaovilai
Author

kaovilai commented Nov 29, 2025

How can I trust the k8s build farm or a local build environment any more than the (imo more verifiable) build image?

@jsafrane
Contributor

jsafrane commented Dec 1, 2025

Even if its a multi-stage build?

Why do you need a multi-stage build? This sidecar is a super simple go binary.

How can I trust k8s buildfarm or local build environment any more than the (imo more verifiable) build image?

There are Kubernetes SIG-testinfra and SIG-release teams that keep the build farm up to date. If you want to challenge their skills or competence, please go to the corresponding teams.

By using a builder image such as alpine, the binary will be compiled with a different Go version than the one used for unit tests and even e2e tests.

@kaovilai
Author

kaovilai commented Dec 1, 2025

Why do you need a multi-stage build?

It was already multi-stage prior to my PR.

It also solves the following:

this image includes a binary from alpine! That's bad, please remove it.

binary will be compiled by a different go version than is used for unit tests and even e2e test.

Assuming the version in question is the go directive or the toolchain directive in go.mod, we can easily reuse that value here.
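A minimal Go sketch of reusing go.mod as the single source of the Go version, assuming the builder image tag should follow the `toolchain` directive when present and the `go` directive otherwise. `goVersions` and `builderImage` are hypothetical helpers, and this is a naive line scanner rather than a full go.mod parser (`golang.org/x/mod/modfile` would be the robust option).

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// goVersions extracts the `go` and `toolchain` directive values from go.mod
// text. The toolchain's "go" prefix is dropped ("go1.24.4" -> "1.24.4").
func goVersions(gomod string) (goDirective, toolchain string) {
	for _, line := range strings.Split(gomod, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		switch fields[0] {
		case "go":
			goDirective = fields[1]
		case "toolchain":
			toolchain = strings.TrimPrefix(fields[1], "go")
		}
	}
	return goDirective, toolchain
}

// builderImage picks a golang image tag from go.mod, preferring toolchain.
func builderImage(gomod string) string {
	goDirective, toolchain := goVersions(gomod)
	if toolchain != "" {
		return "golang:" + toolchain
	}
	return "golang:" + goDirective
}

func main() {
	data, err := os.ReadFile("go.mod")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read go.mod:", err)
		return
	}
	fmt.Println(builderImage(string(data)))
}
```

The printed tag could then be passed as a build argument, so the builder compiles with the same Go version that go.mod declares for unit and e2e tests.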

To summarize the requested changes:

  • No builder image (remove the "builder" stage that downloads grpc_health_probe)
  • Build everything outside the image (i.e. the Makefile cross-build target) and copy it into a single-stage distroless build (refactoring away from the multi-stage build that existed before this PR)
  • In cmd/csi-snapshot-metadata/main.go (or wherever main lives), add a tiny net/http server: register /health (just returns 200 when ready) and /metrics (either a stub or a Prometheus handler if you wire one up).
  • Make that HTTP server listen on a fixed port (e.g. 8080) in a goroutine while the main gRPC server continues as today, so the same binary exposes both gRPC and HTTP.
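A minimal Go sketch of the HTTP side of that plan, using only the standard library. The port `:8080` comes from the list above; the `readiness` type, the 503-until-ready behavior, and the stub `/metrics` body are assumptions, and the existing gRPC server is left out.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// readiness is flipped to true once the sidecar can serve requests.
type readiness struct{ ok atomic.Bool }

func (r *readiness) SetReady() { r.ok.Store(true) }

// newHealthMux serves /health (200 when ready, 503 before) and a stub /metrics.
func newHealthMux(r *readiness) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) {
		if !r.ok.Load() {
			http.Error(w, "not ready", http.StatusServiceUnavailable)
			return
		}
		fmt.Fprintln(w, "ok")
	})
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, _ *http.Request) {
		// Stub: swap in a Prometheus handler if metrics get wired up.
		fmt.Fprintln(w, "# no metrics registered")
	})
	return mux
}

// startHealthServer runs the HTTP server in a goroutine so the caller can
// go on to start the gRPC server as before.
func startHealthServer(addr string, r *readiness) *http.Server {
	srv := &http.Server{Addr: addr, Handler: newHealthMux(r)}
	go func() {
		if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
			log.Printf("health server: %v", err)
		}
	}()
	return srv
}

// statusFor issues an in-process request and returns the status code.
func statusFor(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	r := &readiness{}
	mux := newHealthMux(r)
	fmt.Println("/health before ready:", statusFor(mux, "/health"))
	r.SetReady()
	fmt.Println("/health after ready:", statusFor(mux, "/health"))
	// In the sidecar itself: srv := startHealthServer(":8080", r), then start
	// the gRPC server and call r.SetReady() once it is serving.
}
```

With something like this in place, the pod's readiness probe could switch from an `exec` of grpc_health_probe to an `httpGet` probe on `/health`, port 8080, so no third-party binary is needed in the image.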

